190
14
The Nature of Living Things
than that of the protein (transcription factor)-based regulation; this is borne out by
the length of “noncoding” DNA (proportional to $r$) increasing quadratically with the length of
coding DNA (proportional to $g$) above the 10 Mb threshold. This raises the question of why
protein-based regulation is used at all, even in prokaryotes, if the RNA-based system is
effective and much less costly. Our present knowledge of RNA-based regulation, however,
seems too incomplete for this question to be satisfactorily addressed.
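A quadratic dependence $r \propto g^2$ shows up as a straight line of slope 2 on a log-log plot. The Python sketch below estimates that slope by ordinary least squares. It is only an illustration: the function name `loglog_slope` is hypothetical, and the lengths are invented values generated to follow $r = 0.5\,g^2$ exactly. They are not genome data.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x), i.e. the exponent k in y ∝ x**k."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

# Hypothetical coding lengths g and noncoding lengths r built as r = 0.5 * g**2
coding = [10, 20, 40, 80]
noncoding = [0.5 * g ** 2 for g in coding]
print(round(loglog_slope(coding, noncoding), 6))  # 2.0
```

Applied to real genome sizes, a fitted exponent near 2 (rather than near 1, which would indicate simple proportionality) is the signature of the quadratic scaling described above.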
DNA Base Composition Heterogeneity
The base composition of DNA is very heterogeneous,³⁰ which makes stochastic modelling
of the sequence (e.g., as a Markov chain) very problematic. This patchiness, or
blockiness, is presumed to arise from the processes taking place when DNA is
replicated in mitosis and meiosis (Sect. 14.4.1). It has turned out to be very useful
for characterizing variation between individual human genomes. Much of the
human genome consists of “haplotype blocks”, regions of about $10^4$–$10^5$
nucleotides in which a few sequence variants (fewer than 10; the average number is 5.5)
are said to account for nearly all of the variation in the world's human population. The
haplotype “map” is simply a list of the variants for each block.³¹
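One common way to see compositional patchiness is to compute the G+C fraction in successive windows along a sequence. The minimal Python sketch below does this. The function `gc_profile` and the toy sequence are hypothetical illustrations, not the method of Karlin and Brendel. The toy sequence consists of a GC-rich block followed by an AT-rich block. A single homogeneous model fitted to the whole sequence would assign it an average GC fraction of 0.5 everywhere, which matches no part of it.

```python
def gc_profile(seq, window, step):
    """GC fraction in successive windows of `window` bases, advancing by `step` bases."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = sum(1 for base in chunk if base in "GC")
        profile.append(gc / window)
    return profile

# Hypothetical "blocky" sequence: 40 GC-rich bases followed by 40 AT-rich bases
toy = "GCGC" * 10 + "ATAT" * 10
print(gc_profile(toy, window=20, step=20))  # [1.0, 1.0, 0.0, 0.0]
```

In real genomic DNA the contrast is much less extreme, but plotting such a profile is a quick way to see why a single stationary Markov chain fits the whole sequence poorly.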
Haplotypes are essentially long stretches of DNA characterized by a small number
of single-nucleotide polymorphisms (SNPs, pronounced “snips”), that is, mutated
nucleotides. There is an average of about 1 SNP per thousand base pairs in the human
genome. A typical 50 000 base pair haplotype block therefore contains about 50 SNPs
and, if these were uncorrelated, would admit about $2^{50}$ (or $4^{50}$, depending on
whether we are interested in what the base is mutated to) variants, far more variation
than is actually found. Hence the pattern of SNPs evinces extremely strong constraint;
that is, the occurrences of individual SNPs are strongly correlated with each other.
There is considerable current interest in trying to correlate haplotype variants with
disease, or propensity to disease (Sect. 26.3).³²
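The arithmetic in the preceding paragraph can be checked directly. The Python sketch below assumes, as the text does, 1 SNP per 1000 base pairs and independent SNP sites, with either 2 states per site (SNP present or absent) or 4 (which base is present). The function name `independent_variants` is a hypothetical choice for this illustration.

```python
def independent_variants(block_bp, snps_per_bp=1 / 1000, states=2):
    """Return (number of SNP sites, number of possible haplotypes) for a block,
    assuming every SNP site varies independently with `states` alternatives."""
    n_snps = round(block_bp * snps_per_bp)
    return n_snps, states ** n_snps

n, biallelic = independent_variants(50_000)           # SNP present or absent
_, any_base = independent_variants(50_000, states=4)  # which of the 4 bases is present
print(n, f"{biallelic:.2e}", f"{any_base:.2e}")  # 50 1.13e+15 1.27e+30
```

Set against the average of about 5.5 variants per block quoted earlier, even the smaller figure exceeds what is observed by roughly fourteen orders of magnitude. This gap is what the strong correlation between SNPs has to account for.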
One notes that as much as 98% of the human genome may be identical with that
of the ape; one could equally well state that there is more genetic difference between
man and woman than between man and ape. To actually derive the vast phenotypic
differences between the two from their genomes appears to be as vain a hope as
solving the Schrödinger equation for even a single gene.
As an information-bearing symbolic sequence, the genome is unusual in that it
can operate on itself. The most striking example is furnished by retrotransposons
(a class of transposable elements, whose existence was first proposed by McClintock in
1950). These gene segments encode, inter alia, a reverse transcriptase, the enzyme
that makes a DNA copy of the element from its RNA transcript. The duplicate sequence is
then inserted into the genome; the point of insertion may be remote from that of the
30 For example, Karlin and Brendel (1993).
31 See Terwilliger and Hiekkalinna (2006) for a critique of the International HapMap Project.
32 Another curiosity is that certain DNA sequences display extraordinarily long-range ($10^4$ base
pairs or more) correlations (see, e.g., Voss 1992).